An easily implemented method for abbreviation expansion for the medical domain in Japanese text. A preliminary study.

Authors

  • E Y Shinohara
  • E Aramaki
  • T Imai
  • Y Miura
  • M Tonoike
  • T Ohkuma
  • H Masuichi
  • K Ohe
Abstract

BACKGROUND: One barrier to the effective use of computerized health-care text is the ambiguity of abbreviations. To date, abbreviation disambiguation has been treated as a classification task based on surrounding words. Applying this framework to languages that have no word boundaries requires pre-processing to segment each sentence into a sequence of words. Although segmentation is often a source of problems, it is unknown whether word information is actually required for abbreviation expansion.

OBJECTIVES: As a preliminary study, the present work examined and compared abbreviation expansion methods with and without word information.

METHODS: We implemented two abbreviation expansion methods: 1) a morpheme-based method that relied on word information and therefore required pre-processing, and 2) a character-based method that relied on simple character information. We compared the expansion accuracies of the two methods on eight medical abbreviations. Experimental data were built automatically as a pseudo-annotated corpus from the Internet.

RESULTS: Accuracies for the character-based method ranged from 0.890 to 0.942, while accuracies for the morpheme-based method ranged from 0.796 to 0.932. The character-based method significantly outperformed the morpheme-based method for three of the eight abbreviations (p < 0.05). For the remaining five abbreviations, no significant differences were found between the two methods.

CONCLUSIONS: Character information may be a good alternative, in terms of simplicity, to morphological information for expanding English medical abbreviations that appear in Japanese texts on the Internet.

Similar resources

An Easily Implemented Method for Abbreviation Expansion for the Medical Domain in Japanese Text

E. Y. Shinohara1; E. Aramaki2; T. Imai3; Y. Miura4; M. Tonoike4; T. Ohkuma4; H. Masuichi4; K. Ohe1,5 1Department of Planning, Information and Management, The University of Tokyo Hospital, Tokyo, Japan; 2Center for Knowledge Structuring, The University of Tokyo, Tokyo, Japan; 3Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan;...

Query Architecture Expansion in Web Using Fuzzy Multi Domain Ontology

Due to the growth of the web, there are many challenges in establishing a general framework for data mining and retrieving structured data from the Web. Creating an ontology is a step towards solving this problem. The ontology raises the main entity and the concept of any data in data mining. In this paper, we tried to propose a method for applying the "meaning" of the search system, but the problem ...

Automatic expansion of abbreviations by using context and character information

Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read/process because they are often domain-specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies dictionaries were used to search for abbreviation ex...

An Approximate Method for System of Nonlinear Volterra Integro-Differential Equations with Variable Coefficients

In this paper, we apply the differential transform (DT) method to find an approximate solution of systems of linear and nonlinear Volterra integro-differential equations with variable coefficients, especially of higher order. We also obtain an error bound for the approximate solution. Since in this method the coefficients of the Taylor series expansion of the solution are obtained by a recurrence r...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Journal title:
  • Methods of Information in Medicine

Volume 52, Issue 1

Pages -

Publication date: 2013